feat: enhance parseRemoteAuthority to handle domains containing "--"#917
feat: enhance parseRemoteAuthority to handle domains containing "--"#917ntoofu wants to merge 1 commit intocoder:mainfrom
Conversation
|
Nice fix for the punycode case. Before this lands, one realistic case still misparses, and a slightly simpler shape closes it. Where can
|
| Input | Result |
|---|---|
coder-vscode--foo--bar |
3-part, prefix coder-vscode |
coder-vscode--foo--bar--baz |
4-part, agent baz |
coder-vscode.dev.coder.com--foo--bar |
3-part |
coder-vscode.dev.coder.com--foo--bar--baz |
4-part wins (prefix valid, no .xn) |
coder-vscode.dev.coder.com--foo--bar.baz |
3-part with dot-agent |
coder-vscode.coder.xn--eckwd4c7cu47r2wf.jp--foo--bar (this PR's case) |
i=1 prefix invalid (ends .xn); i=2 accepts |
coder-vscode.xn--p1ai--owner--ws (apex punycode, currently broken) |
i=1 prefix invalid (ends .xn); i=2 accepts |
coder-vscode.xn--abc.xn--def.com--owner--ws (multiple punycode labels) |
i=1 and i=2 invalid; i=3 accepts |
False negatives
The only input this can reject incorrectly is a deployment whose domain ends in a label literally named xn (e.g. something.xn). IANA reserves the xn-- space, url.domainToASCII does not produce bare xn followed by separator content, and no real TLD or registrable label of xn alone exists. So the failure mode is a label that essentially cannot legally be reached.
Side benefit
The slug regex also rejects empty names and names ending in - that the current parser passes through, so input validation tightens up at the same time.
Happy to push this as a follow-up commit on this branch.
Problem
As explained in CONTRIBUTING.md,
The parser splits on "--" to extract each component, but domain labels may contain "--".
In practice, IDNA/Punycode-encoded domain labels use
xn--as the ACE prefix.So the naive split("--") misparses hostnames with such labels, and causes errors like the following image.
For example, a Coder deployment
coder.xn--eckwd4c7cu47r2wf.jpwould produce an SSH host ofcoder-vscode.coder.xn--eckwd4c7cu47r2wf.jp--yourname--your-workspace.When split on "--", this yields
["coder-vscode.coder.xn", "eckwd4c7cu47r2wf.jp", "yourname", "your-workspace"]but the parser expects the first element to be the full hostname, so vscode-coder would try to open a workspaceyournamewhose owner iseckwd4c7cu47r2wf.jpand fails.Solution
Added a reassembly step after the initial split("--").
The algorithm scans from the front of the split result:
This works because Coder usernames never contain dots,
so any dot-bearing segment after the split must be a hostname fragment.
Note: a TLD containing "--" would not be handled correctly because such hostname would fragment into segments the last of which does not contain dot, but no such TLD exists today.